import pandas as pd
import numpy as np
from lets_plot import *
LetsPlot.setup_html(isolated_frame=True)
For Project 1, the answer to each question should include a chart and a written response. The year labels on your charts should not include a comma. At least two of your charts must include reference marks.
# Learn more about Code Cells: https://quarto.org/docs/reference/cells/cells-jupyter.html
# Include and execute your code here
df = pd.read_csv("https://raw.githubusercontent.com/byuidatascience/data4names/master/data-raw/names_year/names_year.csv")
How does your name at your birth year compare to its use historically?
The name Dylan became popular in the 1990s, started to decline around 2000, and appears to be stabilizing in the most recent years of the data.
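A small pandas helper can put the birth-year count in historical context with numbers as well as the chart. It reports the name's peak year and the share of recorded years whose count is at or below the birth-year count, which works as a rough percentile. This is a sketch. `compare_birth_year` is a hypothetical helper that assumes the `name`, `year`, and `Total` columns used in the chart code. The toy rows are made up for illustration and do not come from names_year.csv.

```python
import pandas as pd


def compare_birth_year(df: pd.DataFrame, name: str, birth_year: int) -> dict:
    """Summarize how a name's count in one year compares to its full history.

    Assumes df has 'name', 'year', and 'Total' columns.
    """
    history = df[df["name"] == name].sort_values("year")
    if history.empty:
        raise ValueError(f"No rows for name {name!r}")
    birth_rows = history[history["year"] == birth_year]
    if birth_rows.empty:
        raise ValueError(f"No rows for {name!r} in {birth_year}")
    birth_total = birth_rows["Total"].iloc[0]
    # Row with the largest yearly count for this name
    peak = history.loc[history["Total"].idxmax()]
    # Percent of recorded years with a count at or below the birth-year count
    at_or_below = (history["Total"] <= birth_total).sum()
    return {
        "birth_total": float(birth_total),
        "peak_year": int(peak["year"]),
        "peak_total": float(peak["Total"]),
        "percentile": float(100.0 * at_or_below / len(history)),
    }


# Hypothetical toy data, not values from names_year.csv.
toy = pd.DataFrame({
    "name": ["Dylan"] * 5,
    "year": [1990, 1995, 2000, 2004, 2010],
    "Total": [100.0, 300.0, 400.0, 250.0, 200.0],
})
print(compare_birth_year(toy, "Dylan", 2004))
# {'birth_total': 250.0, 'peak_year': 2000, 'peak_total': 400.0, 'percentile': 60.0}
```

With the real data, `compare_birth_year(df, myName, birthYear)` would return the same keys for Dylan in 2004.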
# Include and execute your code here
myName = "Dylan"
birthYear = 2004
df_name = df[df["name"] == myName]
birth_year_data = df_name[df_name["year"] == birthYear]
p = (ggplot(df_name) +
    geom_line(aes(x='year', y='Total'), color="blue") +
    # Red point marks the birth year as a reference mark
    geom_point(aes(x='year', y='Total'), data=birth_year_data, color="red", size=5) +
    scale_x_continuous(format="d") +  # show years as 2004, not 2,004
    ggtitle(f"Popularity of the Name '{myName}' Over Time") +
    xlab("Year") + ylab(f"Number of Babies Named {myName}") +
    ggsize(400, 200))
p.show()
If you talked to someone named Brittany on the phone, what is your guess of his or her age? What ages would you not guess?
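The name Brittany was very popular for a short time around 1990 and then quickly declined. I would guess that someone named Brittany was born near 1990, the peak year in the data. I would not guess an age that matches a year when the name was rare, such as 1968, the least common year in the data.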
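One way to turn the chart into an age guess is to weight each year by its `Total` and find the years that bracket the middle half of all recorded babies with the name. This is a weighted quantile computed with a cumulative sum. `likely_birth_years` and `age_range` are hypothetical helpers. The 25% and 75% cutoffs and the reference year 2020 are assumptions you would choose yourself, and the toy counts are invented.

```python
import pandas as pd


def likely_birth_years(df: pd.DataFrame, name: str,
                       lower: float = 0.25, upper: float = 0.75) -> tuple[int, int]:
    """Years where the cumulative share of babies first reaches each cutoff.

    This is a weighted quantile of birth year, with each year weighted by Total.
    """
    history = df[df["name"] == name].sort_values("year")
    share = history["Total"].cumsum() / history["Total"].sum()
    low_year = history.loc[share >= lower, "year"].iloc[0]
    high_year = history.loc[share >= upper, "year"].iloc[0]
    return int(low_year), int(high_year)


def age_range(reference_year: int, birth_years: tuple[int, int]) -> tuple[int, int]:
    """Turn an (earliest, latest) birth-year range into a (youngest, oldest) age range."""
    earliest, latest = birth_years
    return reference_year - latest, reference_year - earliest


# Hypothetical toy counts, not values from names_year.csv.
toy = pd.DataFrame({
    "name": ["Brittany"] * 5,
    "year": [1985, 1988, 1990, 1992, 1995],
    "Total": [10, 20, 40, 20, 10],
})
years = likely_birth_years(toy, "Brittany")
print(years)                    # (1988, 1992)
print(age_range(2020, years))   # (28, 32); 2020 is an assumed "current" year
```

Ages outside that range, especially those tied to years before the name took off, would be the least likely guesses.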
# Include and execute your code here
test_name = "Brittany"
df_name = df[df["name"] == test_name]
max_point = df_name[df_name["Total"] == df_name["Total"].max()]
min_point = df_name[df_name["Total"] == df_name["Total"].min()]
max_year = max_point["year"].values[0]
min_year = min_point["year"].values[0]
p = (ggplot(df_name) +
    geom_line(aes(x='year', y='Total'), color="blue") +
    # Green marks the peak year and red marks the least common year
    geom_point(aes(x='year', y='Total'), data=max_point, color="green", size=5) +
    geom_point(aes(x='year', y='Total'), data=min_point, color="red", size=5) +
    scale_x_continuous(format="d") +  # years without commas
    ggtitle(f"Popularity of the Name '{test_name}' Over Time") +
    xlab("Year") + ylab(f"Number of Babies Named {test_name}") +
    ggsize(400, 200))
p.show()
print(f"Most likely {max_year}, least likely {min_year}.")
Most likely 1990, least likely 1968.
Mary, Martha, Peter, and Paul are all Christian names. From 1920 to 2000, compare the name usage of each of the four names in a single chart. What trends do you notice?
After 1975, the use of these names decreased significantly.
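The 1975 claim can also be checked numerically. The sketch compares each name's peak year with its average yearly count before and after the cutoff. `peak_years` and `mean_before_after` are hypothetical helpers. The 1975 cutoff comes from the written answer, and the toy rows are invented.

```python
import pandas as pd


def peak_years(df: pd.DataFrame, names: list, start: int = 1920, end: int = 2000) -> pd.Series:
    """Year in which each name had its highest count within [start, end]."""
    window = df[df["name"].isin(names) & df["year"].between(start, end)]
    # One column per name, one row per year
    wide = window.pivot_table(index="year", columns="name",
                              values="Total", aggfunc="sum")
    return wide.idxmax()


def mean_before_after(df: pd.DataFrame, names: list, cutoff: int = 1975,
                      start: int = 1920, end: int = 2000) -> pd.DataFrame:
    """Average yearly count per name before the cutoff and from the cutoff on."""
    window = df[df["name"].isin(names) & df["year"].between(start, end)]
    before = window[window["year"] < cutoff].groupby("name")["Total"].mean()
    after = window[window["year"] >= cutoff].groupby("name")["Total"].mean()
    return pd.DataFrame({"before": before, "after": after})


# Hypothetical toy data, not values from names_year.csv.
toy = pd.DataFrame({
    "name": ["Mary", "Mary", "Mary", "Paul", "Paul", "Paul"],
    "year": [1920, 1950, 1980, 1920, 1950, 1980],
    "Total": [50, 80, 30, 10, 40, 60],
})
print(peak_years(toy, ["Mary", "Paul"]))
print(mean_before_after(toy, ["Mary", "Paul"]))
```

The four names differ a lot in overall popularity, so comparing each name with its own earlier average is fairer than comparing raw counts across names.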
# Include and execute your code here
names = ["Mary", "Martha", "Peter", "Paul"]
df_filtered = df[(df["name"].isin(names)) & (df["year"] >= 1920) & (df["year"] <= 2000)]
p = (ggplot(df_filtered) +
    geom_line(aes(x='year', y='Total', color='name')) +
    scale_x_continuous(format="d") +  # years without commas
    ggtitle("Usage of Christian Names (Mary, Martha, Peter, Paul) Over Time") +
    xlab("Year") + ylab("Number of Babies") +
    ggsize(400, 200))
p.show()
Think of a unique name from a famous movie. Plot the usage of that name and see how changes line up with the movie release. Does it look like the movie had an effect on usage?
Use of the name Maverick rose slightly after Top Gun was released in 1986, but it rose much more sharply after the sequel, Top Gun: Maverick, was released.
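A before/after comparison around the release year gives a rough number for the effect described in the written answer. `before_after_release` is a hypothetical helper, and the five-year window is an arbitrary assumption. The toy counts are invented. With the real data, calling it with `df`, `movie_name`, and `movie_release_year` would work the same way.

```python
import pandas as pd


def before_after_release(df: pd.DataFrame, name: str, release_year: int,
                         window: int = 5) -> tuple[float, float]:
    """Mean yearly count in the `window` years before and after a release.

    The release year itself is left out of both averages. Years missing
    from the data are simply skipped by the mean.
    """
    history = df[df["name"] == name]
    before = history[history["year"].between(release_year - window, release_year - 1)]
    after = history[history["year"].between(release_year + 1, release_year + window)]
    return before["Total"].mean(), after["Total"].mean()


# Hypothetical toy data, not values from names_year.csv.
toy = pd.DataFrame({
    "name": ["Maverick"] * 11,
    "year": list(range(1981, 1992)),
    "Total": [5, 5, 5, 5, 5, 6, 10, 12, 14, 16, 18],
})
before, after = before_after_release(toy, "Maverick", 1986)
print(before, after)  # 5.0 14.0
```

A higher average after the release is consistent with a movie effect but does not prove one, because other trends can raise a name's use at the same time.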
# Include and execute your code here
movie_name = "Maverick"
movie_release_year = 1986
df_name = df[df["name"] == movie_name]
p = (ggplot(df_name) +
    geom_line(aes(x='year', y='Total'), color='blue') +
    # Dashed red line marks the 1986 release of Top Gun as a reference mark
    geom_vline(xintercept=movie_release_year, linetype="dashed", color="red") +
    scale_x_continuous(format="d") +  # years without commas
    ggtitle(f"Usage of the Name '{movie_name}' Over Time") +
    xlab("Year") + ylab(f"Number of Babies Named {movie_name}") +
    ggsize(400, 200))
p.show()
Reproduce the chart Elliot using the data from the names_year.csv file.
type your results and analysis here
# Include and execute your code here
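Below is a possible starting point for the Elliot chart. It assumes the target chart shows Elliot's yearly totals from 1950 on and marks the 1982 release of E.T. and its 1985 and 2002 re-releases with dashed lines. Check those choices against the chart you are reproducing. `elliot_history` and `elliot_chart` are hypothetical helpers. `lets_plot` is imported inside `elliot_chart` so that the data step can be used and tested on its own.

```python
import pandas as pd

# Assumed reference marks: E.T. (1982) and its 1985 and 2002 re-releases.
RELEASES = pd.DataFrame({
    "year": [1982, 1985, 2002],
    "label": ["E.T. Released", "Second Release", "Third Release"],
})


def elliot_history(df: pd.DataFrame, start_year: int = 1950) -> pd.DataFrame:
    """Yearly totals for Elliot from start_year on (start_year is an assumption)."""
    rows = df[(df["name"] == "Elliot") & (df["year"] >= start_year)]
    return rows[["year", "Total"]].sort_values("year").reset_index(drop=True)


def elliot_chart(history: pd.DataFrame):
    """Line chart of Elliot's usage with dashed reference lines at each release."""
    # Imported here so elliot_history works without lets_plot installed.
    from lets_plot import (aes, geom_line, geom_text, geom_vline, ggplot,
                           ggsize, ggtitle, scale_x_continuous, xlab, ylab)

    top = history["Total"].max()
    # Stagger label heights so the 1982 and 1985 labels do not overlap.
    labels = RELEASES.assign(y=[top, 0.85 * top, top])
    return (ggplot(history)
            + geom_line(aes(x="year", y="Total"), color="steelblue")
            + geom_vline(aes(xintercept="year"), data=RELEASES,
                         linetype="dashed", color="red")
            + geom_text(aes(x="year", y="y", label="label"), data=labels,
                        hjust=0, size=6)
            + scale_x_continuous(format="d")  # years without commas
            + ggtitle("Usage of the Name 'Elliot' Over Time")
            + xlab("Year") + ylab("Total")
            + ggsize(600, 300))


# Hypothetical toy rows, not values from names_year.csv.
toy = pd.DataFrame({
    "name": ["Elliot", "Elliot", "Elliot", "Dylan"],
    "year": [1940, 1950, 1982, 1982],
    "Total": [100.0, 150.0, 400.0, 900.0],
})
print(elliot_history(toy))
```

With the data loaded at the top of the notebook, `elliot_chart(elliot_history(df)).show()` would draw the chart.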